On this page

Flare-On 12 Challenge 5

- min read

5. ntfsm

We are given a binary named ntfsm.exe. Running it prints:

PS C:\Users\user\Desktop\Flare Writeup\5> .\ntfsm.exe
usage: ./ntfsm <password>
to reset the binary in case of weird behavior: ./ntfsm -r
PS C:\Users\user\Desktop\Flare Writeup\5> .\ntfsm.exe 123
input 16 characters

So it expects a 16 character password. The -r option is interesting, "reset the binary in case of weird behavior" suggests it's stateful and probably storing something between runs.


Let's check its behavior in Procmon to see how and where it stores that state.

Procmon ADS files

We can see the program is opening these interesting looking files: ntfsm.exe:state, ntfsm.exe:position, and ntfsm.exe:transitions. These are in fact not regular files but are NTFS Alternate Data Streams. This means the program can write its state into these streams and then read them back the next time it runs. The data in those streams persists on the filesystem, so the binary effectively gets a save state embedded inside itself. The name of the streams is interesting too: state, position, and transitions suggest this is some kind of finite state machine.


Another thing we can observe is the large file size: 20 MB seems too large for a simple console application. Let's check in PE Bear where it's coming from.

PE Bear sections

17 MB comes from the .text section alone! That's unusually large. To get a good starting point for reversing this, let's try to enter some password and see if we can work backwards from its behavior.

PS C:\Users\user\Desktop\Flare Writeup\5> .\ntfsm.exe 1234567812345678
wrong!

This time Procmon also shows that the program creates more instances of itself! Apart from this "wrong!" message, we get several cmd.exe instances popping up, followed by these message boxes:

Hello there, hacker

If the "wrong!" string exists in the binary, as opposed to being constructed during runtime, that would be a perfect starting point for static reversing.

IDA xrefs to "wrong!"

As luck would have it, it's there. It has a single xref to 0x14000C421. Trying to decompile the function at that address didn't go well, as IDA complained about the size of the function, so we will continue with the disassembly view.

                jmp     loc_14000C593
; ---------------------------------------------------------------------------

loc_14000C421:                          ; CODE XREF: sub_14000C0B0+153↑j
                lea     rcx, aWrong     ; "wrong!\n"
                call    sub_140002829

Looks like a puts call or a wrapper. The preceding instruction is an unconditional jump, so this block is entered via an explicit branch. IDA is kind enough to add a comment next to the label, showing us which jump targets this block. Let's rename this block and follow the xref.

.text:000000014000C1EB 
cmp     [rsp+59398h+var_8E8], 10h
jnz     loc_14000C593
cmp     [rsp+59398h+var_8E0], 10h
jnz     wrong_password
lea     rcx, aCorrect   ; "correct!\n"
call    sub_140002829

So we require the local stack variables var_8E8 and var_8E0 to both equal 0x10. Let's rename these req1 and req2 respectively and continue. To find what req1 is, we can check its xrefs. A particularly interesting reference is at 0x14000C182, where req1's address is loaded as an argument to a function:

lea     rdx, [rsp+59398h+req1]
mov     rcx, [rsp+59398h+var_8C8]
call    sub_1400014A1

This call is actually just a jump stub to a function:

__int64 __fastcall sub_140FF0FE0(__int64 a1, void *req1)
{
  const CHAR *v2; // rax
  HANDLE hFile; // [rsp+48h] [rbp-80h]
  __int64 v5; // [rsp+60h] [rbp-68h]
  _BYTE v6[40]; // [rsp+68h] [rbp-60h] BYREF
  _BYTE v7[40]; // [rsp+90h] [rbp-38h] BYREF

  v5 = sub_14000166D(v6, a1);
  sub_1400036A7(v7, v5);
  v2 = (const CHAR *)sub_1400014B5(v7);
  hFile = CreateFileA(v2, 0x80000000, 0, 0, 3u, 0x80u, 0);
  if ( hFile == (HANDLE)-1LL )
  {
    sub_140001947(v7);
    sub_140001947(a1);
    return 0;
  }
  else
  {
    ReadFile(hFile, req1, 8u, 0, 0);
    CloseHandle(hFile);
    sub_140001947(v7);
    sub_140001947(a1);
    return 1;
  }
}

So req1 gets its value from a ReadFile call. We can find out where it's reading from by placing a breakpoint in x64dbg at the call leading to this function, stepping until the CreateFileA call, and examining its arguments. This reveals the "file" to be ntfsm.exe:position. Let's rename the sub_140FF0FE0 function to load_file and continue.


What about the second variable, req2?

IDA xrefs to req2

86k references! And that's with the analysis stopped before completion, as letting it run would eventually crash IDA. Most of the xrefs lead to blocks similar to these:

case_2:                                 ; CODE XREF: sub_14000C0B0+B23↑j
                mov     [rsp+59398h+var_668], 0C413h
                mov     rax, [rsp+59398h+req2]
                inc     rax
                mov     [rsp+59398h+req2], rax
                jmp     short loc_14000CC5B
; ---------------------------------------------------------------------------

case_1:                                 ; CODE XREF: sub_14000C0B0+B1C↑j
                mov     [rsp+59398h+var_668], 0C414h
                mov     rax, [rsp+59398h+req2]
                inc     rax
                mov     [rsp+59398h+req2], rax
                jmp     short loc_14000CC5B
; ---------------------------------------------------------------------------

case_3:                                 ; CODE XREF: sub_14000C0B0+B2A↑j
                mov     [rsp+59398h+var_668], 0C415h
                mov     rax, [rsp+59398h+req2]
                inc     rax
                mov     [rsp+59398h+req2], rax
                jmp     short loc_14000CC5B

Some constant is moved to var_668, req2 is incremented, and execution jumps to 0x14000CC5B. The blocks are reached from a switch statement located just above:

loc_14000CB7E:                          ; CODE XREF: sub_14000C0B0+9AA↑j
                rdtsc                   ; jumptable 000000014000CA5A case 25064
                shl     rdx, 20h
                or      rax, rdx
                mov     [rsp+59398h+var_680], rax

loc_14000CB8F:                          ; CODE XREF: sub_14000C0B0+B0C↓j
                rdtsc
                shl     rdx, 20h
                or      rax, rdx
                mov     [rsp+59398h+var_678], rax
                mov     rax, [rsp+59398h+var_680]
                mov     rcx, [rsp+59398h+var_678]
                sub     rcx, rax
                mov     rax, rcx
                cmp     rax, 12AD1659h
                jl      short loc_14000CB8F
                movzx   eax, [rsp+59398h+var_59368]
                mov     [rsp+59398h+var_5935C], al
                cmp     [rsp+59398h+var_5935C], 4Ah ; 'J'
                jz      short case_1
                cmp     [rsp+59398h+var_5935C], 50h ; 'P'
                jz      short case_2
                cmp     [rsp+59398h+var_5935C], 53h ; 'S'
                jz      short case_3
                jmp     short default_case

Since we know we are dealing with a finite state machine, we can make an educated guess that these are the transition tables of the FSM. var_668 is probably the state of the FSM. req2 might indicate the number of correct transitions. IDA has also marked the beginning of the block as jumptable case, so these targets are presumably selected based on the state of the FSM. Let's verify!

                cmp     cs:qword_14132D480, 0FFFFFFFFFFFFFFFFh
                jnz     short loc_14000CA20
                mov     cs:qword_14132D480, 0

loc_14000CA20:                          ; CODE XREF: sub_14000C0B0+963↑j
                mov     rax, cs:qword_14132D480
                mov     [rsp+59398h+var_660], rax
                cmp     [rsp+59398h+var_660], 1629Ch
                ja      loc_140C6847C
                lea     rax, cs:140000000h
                mov     rcx, [rsp+59398h+var_660]
                mov     ecx, ds:(jpt_14000CA5A - 140000000h)[rax+rcx*4] ; switch 65535 cases
                add     rcx, rax
                jmp     rcx             ; switch jump

Looks like a global variable is used to calculate the jump index. Following its xrefs, we find that qword_14132D480 is an argument to the function load_file, the one we found earlier:

location_1400BFD7:
                lea     rdx, qword_14132D480
                mov     rcx, [rsp+148h+var_118]
                call    load_file

Similarly, we can place a breakpoint at the CreateFileA call in the function to see where it's reading from. This time we find it's ntfsm.exe:state. Seems like our guess was correct, the state of the FSM is the index of the jump table.


We can also confirm our guess about the functions in the jumptable. Since the initial state is 0, we can place a breakpoint at the first function, 0x140860241, and verify that: (1) the breakpoint is hit, and (2) the switch case is indeed using the first character from our input. What about subsequent states? The transition map of state 0 is as follows:

cmp     byte ptr [rsp+3BB8Ch], 4Ah ; 'J' - Sets var_668 to 2
jz      short loc_1408602CE
cmp     byte ptr [rsp+3BB8Ch], 55h ; 'U' - Sets var_668 to 3
jz      short loc_1408602EF
cmp     byte ptr [rsp+3BB8Ch], 69h ; 'i' - Sets var_668 to 1

If we pass an input starting with J we should expect execution to reach the transition function of state 2: 0x1401F4F57. But this never happens. Looks puzzling at first but recall that: (1) the binary creates more instances of itself, and (2) the ADS shows it's stateful. Looking at the imports we indeed see a CreateProcessA import, with a single reference at 0x14000B0CD. We can make a (reasonable) guess that each instance of ntfsm.exe processes a single transition step in the FSM. This too can be verified with a debugger.


Let's break at the CreateProcessA call at 0x14000B0CD, and modify the dwCreationFlags argument to launch the process in a suspended state using the CREATE_SUSPENDED flag (0x4).

BOOL CreateProcessA(
  [in, optional]      LPCSTR                lpApplicationName,
  [in, out, optional] LPSTR                 lpCommandLine,
  [in, optional]      LPSECURITY_ATTRIBUTES lpProcessAttributes,
  [in, optional]      LPSECURITY_ATTRIBUTES lpThreadAttributes,
  [in]                BOOL                  bInheritHandles,
  [in]                DWORD                 dwCreationFlags,
  [in, optional]      LPVOID                lpEnvironment,
  [in, optional]      LPCSTR                lpCurrentDirectory,
  [in]                LPSTARTUPINFOA        lpStartupInfo,
  [out]               LPPROCESS_INFORMATION lpProcessInformation
);

On the x64 fastcall, the 6th parameter would be at: [rsp + 0x28] (shadow store of 32 bytes + 8). Now we can open a new x64dbg instance and attach a debugger to the new instance of ntfsm.exe:

x64dbg attach debugger to new ntfsm.exe instance

We can place a breakpoint at the transition function of state 2 (0x1401F4F57), go to the threads tab and resume the main thread:

x64dbg resume thread

And our breakpoint will be hit. Stepping to the transition switch shows that it's matching against the 2nd character of our input too!


Let's recap what we have found so far.

  • To reach the "correct!" message, we need:
    • req1, which is a value read from ntfsm.exe:position, to be 0x10.
    • req2, which is incremented in the FSM transition functions, to be 0x10.
  • Each instance of ntfsm.exe processes a single transition in the FSM and then spawns the next instance.
  • A global variable qword_14132D480 reads its value from ntfsm.exe:state. That variable is then used to calculate an index in a jump table.
  • The functions in the jump tables are transition functions for the states of the FSM.
  • If any of the cases of the transition functions are matched:
    • req2 is incremented.
    • A constant is moved into var_668, the new state of the FSM.

req1 will be satisfied once the FSM has processed all 16 characters. If we assume there is only 1 correct password, this means that either our observation that req2 is incremented each match is wrong, or that there must be transition functions which do not match any characters. We could verify this by manually walking a valid path. Doing so reveals that req2 is indeed incremented throughout the path, but the transition function of the last character breaks the pattern, and doesn't match any characters.

x64dbg transition function of the last character

So to find a valid password, all we have to do is find a path whose last state's transition function matches at least 1 character. We can make several observations:

  1. Inside each transition function, the accepted transitions show up as a sequence of byte comparisons against the next input character: cmp byte ptr [rsp+disp32], imm8, followed by a conditional jump.
  2. The imm8 operand in that cmp is the character the FSM expects at this position.
  3. At the jump target, the first instruction sets the new FSM state: mov qword ptr [rsp+58D30h] (our var_668).

With that knowledge we can write a Python script that runs a DFS algorithm to find the correct password:

import pefile, struct
from capstone import Cs, CS_ARCH_X86, CS_MODE_64, CS_GRP_JUMP
from capstone.x86_const import X86_OP_MEM, X86_OP_IMM, X86_REG_RSP

exe="ntfsm.exe"; BASE=0x140000000; CONST=0x00C687B8; DISP_STATE=0x58D30; MAXLEN=16

pe=pefile.PE(exe, fast_load=True); blob=open(exe,"rb").read()
IB=pe.OPTIONAL_HEADER.ImageBase
off=lambda va: pe.get_offset_from_rva(va-IB)
u32=lambda va: struct.unpack_from("<I", blob, off(va))[0]
bs =lambda va,n: blob[off(va):off(va)+n]

md=Cs(CS_ARCH_X86, CS_MODE_64); md.detail=True
jt_va=lambda x: BASE+CONST+4*x

def dfs(state=0, s="", vis=None):
    if vis is None:
        vis = set()
    if state in vis:              # cycle detected -> kill this path
        return
    vis.add(state)

    if len(s) == MAXLEN:          # reached accepting length
        print(s)
        vis.remove(state)
        return

    # transition function of the current state
    found = BASE + u32(jt_va(state))

    # only need the first ~0x80 bytes because transitions are clustered there
    insns = list(md.disasm(bs(found, 0x80), found))

    transitions = []

    # collect all valid cmp/jcc pairs for branching DFS
    for i in range(len(insns) - 1):
        cmpi, jcc = insns[i], insns[i + 1]

        # cmp byte ptr [rsp+disp32], imm8  -> imm8 is the candidate char
        if cmpi.mnemonic != "cmp":
            continue
        o = cmpi.operands
        if not (len(o) == 2 and
                o[0].type == X86_OP_MEM and
                o[0].mem.base == X86_REG_RSP and
                o[0].size == 1 and
                o[1].type == X86_OP_IMM):
            continue

        # next instruction must be a conditional jump (jcc target)
        if not (jcc.group(CS_GRP_JUMP) and
                jcc.mnemonic.startswith("j") and
                jcc.mnemonic != "jmp"):
            continue

        # character = imm8 from the cmp
        chv = o[1].imm
        ch  = chr(chv) if 32 <= chv < 127 else "."

        # follow the conditional jump to get next state
        target = jcc.operands[0].imm

        # disassemble again at the target
        # first instruction is expected to be:
        # mov qword ptr [rsp+58D30h], X
        tgt_insn = next(md.disasm(bs(target, 0x20), target), None)
        if not tgt_insn or tgt_insn.mnemonic != "mov":
            continue

        to = tgt_insn.operands
        if not (len(to) == 2 and
                to[0].type == X86_OP_MEM and
                to[0].mem.base == X86_REG_RSP and
                to[0].mem.disp == DISP_STATE and
                to[1].type == X86_OP_IMM):
            continue

        # X from the mov is the next FSM state
        nxt = to[1].imm
        transitions.append((ch, nxt))

        if len(transitions) == 3:
            break

    for ch, nxt in transitions:
        dfs(nxt, s + ch, vis)

    vis.remove(state)

dfs(0)

The script finds a valid password and we get the flag!

PS C:\Users\user\Desktop\Flare\5> py solver.py
iqg0nSeCHnOMPm2Q
PS C:\Users\user\Desktop\Flare\5> .\ntfsm.exe iqg0nSeCHnOMPm2Q
correct!
Your reward: [email protected]
PS C:\Users\user\Desktop\Flare\5>